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Abstract 

As the number of possible predictors generated by high-throughput experiments 
continues to increase, methods are needed to quickly screen out unimportant covari- 
ates. Model-based screening methods have been proposed and theoretically justified, 
but only for a few specific models. Model-free screening methods have also recently 
been studied, but can have lower power to detect important covariates. In this paper 
we propose EEScreen, a screening procedure that can be used with any model that can 
be fit using estimating equations, and provide unified results on its finite-sample screen- 
ing performance. EEScreen thus generalizes many recently proposed model-based and 
model-free screening procedures. We also propose iEEScreen, an iterative version of 
EEScreen, and show that it is closely related to a recently studied boosting method 
for estimating equations. We show via simulations for two different estimating equa- 
tions that EEScreen and iEEScreen are useful and flexible screening procedures, and 
demonstrate our methods on data from a multiple myeloma study. 

Keywords: Estimating equations; Ultra-high-dimensional data; Sure independence 
screening; Variable selection 
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1 Introduction 



Modern high-throughput experiments are producing high-dimensional datasets with ex- 
tremely large numbers of covariates. Traditional regression modeling strategies work poorly 
in su ch situations, lead ing to recent interest in regularized regressi on methods such as the 



lasso (ITibshiranil . 



19961 ). the Dantzig selector ( ICandes and Tad . 



20071 ). and SCAD flFan and Li 



200lh . These procedures can perform well in estimation and prediction even when the num- 
ber of covariates Pn is larger than the sample size n, where here we are allowing pn to grow 
with n. However, when pn is extremely large compared to n, these methods can become inac- 



curate and computationally infeasible (IFan and Lv 



20081 ). Thus there is a need for methods 



to quickly screen out unimportant covariates before using regularization methods. 

A number of screening strategies have so far been proposed, and choosing which one to 
use depends on what model we believe is most suitable for the data. Under the ordinary 



linear model, iFan and Lvl (|2008[ ) proposed a procedure with the sure screening property, 
where the covariates retained after screening will contain the truly important covariates 
with probability appr oaching one, eyen in t he ul tra-high-dimensiona l realm where pn grows 



exponentially with n. 



Zhao and Lil (120121 ) subsequently proposed 



Fan and Song (120101 ) and 
procedures that maintain this property for generalized linear models and the Cox model, 
respectively . Scre ening methods have also been pr oposed for nonp arametric additive models 



( iFan et al 



20111). linear transformation mo dels (ILi et al. 



models (iGorst-Rasmussen and Scheikd. 



In a recent development. 



Zhu et al. 



201lh 



20111), and single- index hazard 



(120111 ) proposed a screening method valid for any 



single-index model, a class so large that their screening procedure is nearly model-free. 
They used a new measure of dependence which can detect a wide variety of functional 
relationships between the covariates and the outcome, and proved that their method has 
the sure screening property for any single-index model. They also showed in simulations 
that it could significantly outperform model-based screening methods when the models were 
incorrectly specified. 



On the other hand, model-based screening can have greater power to detect important 
covariates, a consequence of the bias-variance tradeoff. However, there are often situations 
where we wish to use some model other than the ones mentioned above. For example, 
studies involving clustered observations, missing data, or censored outcomes are frequently 
encountered in genomic medicine, and are often analyzed with more complicated regression 
models for which no screening methods have yet been developed. In theory it is not difficult 
to propose a screening procedure for any given model: fit p„ marginal regressions, one for 
each covariate, and retain those covariates with the largest marginal estimates, in absolute 
value. But fitting pn marginal regressions can still be time-consuming, especially if pn is 
very large and the fitting procedure is slow, and theoretical properties such as sure screening 
must still be studied on a case-by-case basis. 

In this paper we propose EEScreen, a unified approach to screening which can be used 
with any statistical model that can be fit using estimating equations. This is convenient 
because estimating equations are frequently used to analyze the previously mentioned cor- 
related, missing, or censored data situations. EEScreen is also fundamentally different from 
most other screening procedures in that it only requires evaluating p„ estimating equations 
at a fixed parameter value, rather than solving for pn marginal regressione estimates, making 
it exceedingly computationally convenient. We prove theoretical results about the screen- 
ing properties of EEScreen that hold for any model that can be fit using U-statistic-based 
estimating equations. 

Furthermore, because we can design estimating equations to incorporate more or fewer 
modeling assumptions, we can use our EEScreen framework to span the range between 
model-based and model-free screening. In particular, we sho w that EEScreen can provide 



a screening method very similar to that of 



ZhuetaL 



(120111 ) when used with a particular 



estimating equation. This estimating equation actually cannot be used for estimation in 
practice because it involves unknown parameters, but interestingly can still be used to derive 
a useful screening procedure. 
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Fan and Lvl (|2008[ ) suggested an iterative 



Finally, when covariates are highly correlated, 
version of their screening procedure, which they found to outperform marginal screening in 
some cases. In this paper we provide an iterative version of EEScreen (iEEScreen), and we 
also demonstrate a novel connection between iEEScreen and EEBoost, a recently proposed 
boost ing algorithm for estimation and variable selection in estimating equations (iWolfsoru . 



20111 ). This connection may provide a means for a theoretical analysis of iterative screening 



methods, something which so far has been difficult to study. 

We introduce EEScreen in Section |2l where we also give some examples, establish its 
theoretical properties, and briefly discuss how to choose the number of covariates to ret ain 

Zhu et all (120111 ) in 



after screening. We derive a new screening method similar to that of 
Section [31 and discuss iEEScreen in Section HJ We conduct a thorough simulation study in 
Section [5l using two different estimating equations, before applying our methods to analyze 
an issue in multiple myeloma in Section El We conclude with a discussion in Section [71 and 
provide proofs in the Appendix. 



2 EEScreen: sure screening for estimating equations 
2.1 Method 

Let Yi = {Yii, . . . , YiKi)'^ be a i^'i X 1 outcome vector and Xj = (Xji, . . . , Xj^J^ be a i^'j x p„ 
matrix of covariates for units i = 1, . . . , n. Then let Y = (Yi, . . . , Y„)"^ be a ^ . i^j x 1 
vector and X = (Xf , . . . , X^)-^ be a Ki x p„ matrix. Assuming some regression model, 
we can construct a p„ x 1 estimating equation U(/3) that depends on Yj and Xj such that 
E{U(/3q)} = 0, where /3q is the true Pn x 1 parameter vector. Let the set of true regression 
parameters Ai = {j : f3oj 7^ 0} have size |A^| = Sn, where (3oj is the j*'^ component of [3^. 
It is commonly assumed that s„ is small and fixed or growing slowly. When p„ < n, (3q is 
estimated by finding the ^ such that U(;9) = 0, but when pn > n t here are an infinite number 



of solutions for f3, in which case regularized regression is used (iFu 



2003 



Johnson et al 



2008 
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Wolfson 



201ll ). However, when p„ is much greater than n, these methods can lose accuracy 
and be too computationally demanding, hence the need for screening methods to quickly 
reduce pn- 

Most previously proposed screening methods proceed by fitting p„ regression models, 
one covariate at a time, to get p„ marginal estimates aj. They then retain the covariates 
with \aj\ above some threshold. This is akin to conducting p„ Wald tests, though without 
standardizing the aj by their variances. However, in the case of estimating equations, even 
this procedure can be time-consuming if pn is large or U is cumbersome to fit. 

Here, instead of marginal Wald tests, we construct marginal score tests for the (3oj using 
U. To motivate our procedure, we first consider the case where the marginal model is correct 
for /3oi. In other words, /3oi ^ while Pqj = for all j ^ 1. Then E[U{(/3oi, 0, . . . , 0)}] = 0, 
so that each component of U is a valid estimating equation for /3oi. This implies that each 
component of U(0) is the numerator of a score test for the null hypothesis Poi = 0. If the 
marginal model is correct for /3oi, then to achieve sure screening we must reject the score 
test. Therefore we use as our screening statistic the component of U(0) that gives the most 
powerful test, which we denote f/i(0). For each j, we can identify the component Uj{0) of 
U(0) that is most powerful for testing f3oj = under the marginal model that /3oj is the only 
non-zero parameter. In many situations the first component of U(0) will be associated with 
/3oi, the second with /3o2, and so on. When this is not the case, we can follow the construction 
above to relabel the components of U(0) appropriately. 

We propose using the relabeled Uj{0) as surrogate measures of association between the 
outcome and the j*^ covariate, after first standardizing the covariates to have equal variances. 
Instead of just taking the numerators of the score tests we could divide each Uj{0) by an 
estimate of its standard deviation, but this would add computational complexity to our 
procedure, and even without doing so we will be able to achieve good results and prove 
finite-sample performance guarantees. One advantage to using score tests is that they do not 
require parameter estimation and so are more computationally convenient than performing 
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Pn marginal regressions. Furthermore, this framework will also allow us to give a unified 
treatment of the theoretical results for a large class of estimating equations. 
Specifically, we propose the following screening procedure: 

1. Standardize the p„ covariates to have variance 1. 

2. For the j*'* parameter identify the marginal estimating equations Uj as described above. 

3. Set a threshold 7„. 

4. Retain the parameters {j : |t/j(0)| > 7„}. 

We will denote the set of retained parameters by Ai. Note that this procedure only re- 
quires evaluating pn estimating equations at 0, which can be computed very quickly. The 
convenience of score tests, however, comes at the price of ambiguity in the proper treatment 
of nuisance parameters, such as the intercept term in a regression model. Without loss of 
generality, let /3oi be the intercept term. We can first fit the intercept without any covariates 
in the model to get an estimate /3oi- This only needs to be done once, since /3oi will remain 
the same for each Uj. We then screen by evaluating each Uj at rj = (/3ni,Q) inst ead of at 0. 

Our score test idea was motivated by the EEBoost algorithm ( jWolfsonl . 1201 ll ). a boosting 
procedure for estimating equations which uses components of the estimating equation U as 
a surrogate measure of association. We therefore refer to our method as EEScreen, and we 
will draw more connections between EEScreen and EEBoost in Section HI 



2.2 Examples 

Here we provide some examples of EEScreen for various estimating equations, assuming 
throughout that E(Xj) = and var(Xy) = 1. For the linear model with Ki = 1, the 
usual linear regression score equation is U(/3) = X-^(Y — X/3), so U(0) = X-^Y. Under 
the marginal model that (3oj/ is the only non-zero parameter, the j^^ component of E{U(0)} 
equals cor(Xjj, Xy/)/3oj/, where Xij is the j^^ component of the i^^ covariate vector. Clearly 
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this is maximized when j = j' for any value of Poj/, so the component of U(0) that gives the 
most powerful test is Uj>{0). EEScreen then retains the parameters {j : | y^, XjjYjl > 7„,j. 
Note that this is equivalent to the original screening procedure proposed by 



Fan and Lv 



teoosh 



Under the Cox model, when Ki = 1 with survival outcomes, let Tj be the survival time, 
Ci the censoring time, Yi = min(Tj, C,), and 6i = I{Ti < Ci). The Cox model score equation 
is 

dMx). (1) 



Tirm-V^/Jx Er=iX.y.(x)exp(xf/3) ) 



where iVj(x) = /(Tj < x,6i) is the observed failure process and Yi{x) = /(Fj > x) is 
the at-risk process. Under the mar ginal model that Poji is the only non-zero parameter. 



Gorst-Rasmussen and Scheikd (120111 ) show that the largest component of the limiting esti- 
mating equation evaluated at is found for the j that maximizes j coT{Xij, F{t \ Xij/)}, 
where F{t \ Xiji) is the distribution function of Tj, conditional on Xij/. Thus the component 
of U(0) that gives the most powerful test is again the j'*^ component. EEScreen then retains 
the parameters 



i=l 



X,;, 



This is exactly the screening statistic of 







1 dNi{x) 


> In 



Gorst-Rasmussen and Scheikd ( 20 1 11) . This exa mple 



illustrates the computational advantages that EEScreen can enjoy. 



(2) 



Zhao and Li 



mm pro 



posed screening for the Cox model based on fitting marginal Cox regressions, which requires 



Pn app lications of the Newton- Raphson algorithm. In contrast, 



Gorst-Rasmussen and Scheike 



(I2OIII ) and EEScreen only require evaluating the Uj{0). 



The ordinary linear model and the Cox model have already been studied in the screening 
literature, but EEScreen is most useful for models for which no screening procedures exist 



yet. 



(jjung 



n Sect ions |5] we study its performance on two such models: a t-year s urviv al model 



19961 ) and the accelerated failure time model (ITsiatis 



1996 



Jin et al. 



20031 ). 
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2.3 Theoretical properties 

One advantage of our EEScreen framework is that we can provide very general theoretical 
guarantees on its screening performance that hold for a large class of models, without needing 
to study each model on a case-by-case basis. We require three assumptions on the marginal 
estimating equations Uj to prove that EEScreen has the sure screening property, where the 
probability that the retained parameters A4 contains the true parameters Ai approaches 
1. Let the expected full estimating equations be denoted u(/3) = E{U(/3)}, so that the 
expected marginal estimating equations are Uj{(3). 

Assumption 1 Let Xjj be the Ki x 1 vector of the j^^ covariate for the i^^ unit. Each 
estimating equation Uj has the form 

^.(/3)=Q E /^.{/3;(Y.„X,J,...,(Y,„,X,„)} (3) 

^ ^ l<h<...<im 

for all j , where n > m and hj is a real-valued kernel function that depends on (3 and is 
symmetric zn t/ie ( Yi^ , J , . . . , ( Yj^ , Xj„ ) . 

Assumption 2 There exist some constants b > and > such that for all j , \Uj{0) — 
u,{0)\ < b and var[/i,{0; (Y,„ X,J, . . . , (Y„„, X,„)}] < S^. 

Assumption [1] requires that each Uj be a U-statistic of order m, which encompasses a 
large number of important estimating equations. Assumption |2] amounts to conditions on 
the moments of the Uj, and they can often be satisfied by assuming bounded outcomes and 
covariates. These conditions are necessary for stating a Bernstein-type inequality for the 
Uj, which gives the probability bounds in Theorems [T] and [2J They can therefore be relaxed 
as long as there exists a similar p robability inequalit y for Uj. For example, Bernstein- type 



inequalities exist for martingales (Ivan de Geer 
model score equations. 



19951 ). which would allow Uj to be the Cox 
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Assumption 3 There exists some constant Ci > such that miiij^j^ \uj{0)\ > Ci[n/m\~'^ 
with < K < 1/2, where m is defined in AssumptionUl CLnd [n/rri\ is the largest integer less 
than n/m. 

Assumption [3] is an assumption on the marginal signal strengths of the covariates in 
M.. In EEScreen these signals are quantified by the Mj(0), and Assumption [3] requires them 
to be of at least a certain order so that they are detectable given a sample size n. An 
assumption of this type is always needed in a theoretical analysis of a screening procedure. 
For example, in the g eneralized linea r mode l setting, our Assumption [3] is exactly equivalent 



Fan and SongI (120101 ) that the magnitude of the covariance between 



to the assumption of 

E(Yj I Xj) and the j*^ covariate be of order n~'^. Since EEScreen is similar to conducting pn 
score tests. Assumption [3] is similar to requiring that the expected value of the marginal score 
test statistic for j G be of a certain order. As previously mentioned, we could standardize 
the screening statistic |mj(Q)| by its variance, in which case the score test analogy would be 
exact. It is very reasonable to use the marginal score test statistic as a proxy for the marginal 
association of the covariates. 

Under these assumptions, we can show that EEScreen possesses the sure screening prop- 
erty. 

Theorem 1 Under Assumptions\^;\^ if In = Ci[n/ni\~'^ /2 for < k < 1/2, with m defined 
in AssumptionUl then 

with and b defined in Assumptions^ 

Theorem [T] guarantees that all important covariates will be retained by EEScreen with 
high probability. Similar to previous work, we find that this probability bound depends 
only on s„ and not on p„. The bound also depends on m, the order of the U-statistic, so 
that EEScreen may not perform as well for larger m. Theorem [1] is almost an immediate 
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consequence of properties of U-statistics, and the simplicity of the proof is due to the fact 
that EEScreen uses score tests instead of Wald tests. We therefore do not need to estimate 
any parameters, nor prove probabihty inequahties for those estimates, which is a major 
source of technical difficulty in previous work on screening. 

Theorem [1] is most useful if the size of the A4 produced by EEScreen is small. In 
other words, we hope that A4 does not contain too many false positives. With two more 
assumptions, we can provide a bound on \A4\ that holds with high probability. 

Assumption 4 The expected full estimating equation u(/3) is dijferentiable with respect to 
f3. Let the negative Jacobian —du/d(3 be denoted i(/3). 

Assumption 5 There exists some constant C2 > such that \\(3q\\2 < C2- 

Assumption H] can hold even if the sample estimating equation U is nondifferentiable. 
Assumption [5] merely requires that there exist an upper bound on the size of the true I3q 
that does not grow with n, which is a reasonable condition. 

Theorem 2 Under AssumptionsUlWi '^f In = Ci[n/m]~'^/2 as in TheoremU\ then 



\M\ < 



IQcia, 



,2^*2 



cf [n/niY 



-2k 



where and b are defined in AssumptionlE and o-^^^ = supg^^^^^ cTmaxlU^/^o)}? where 
crmax(A) denotes the largest singular value of the matrix A. 

Like Theorem [H Theorem [2] is also almost a simple consequence of properties of U- 
statistics. Theorem [2] provides a finite-sample probability bound on \A4\, but asymptotically 
we would need assumptions on i(/3*) to guarantee that aj^^^^ will not increase too quickly. In 
particular, if crj^ax increased only polynomially in n, \Xi \ would increase polynomially. At the 
same time, the probability that the bound holds tends to one even if logp„ = o([n/m]^^^''), 
so the false positive rate would decrease quickly to zero with prob ability approachi ng one 



even in ultra-high dimensions. A similar phenomenon was found by 
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Fan et al. 



mm 



The presence of crj^^^^ in Theorem |2] reflects the dependence of on the degree of 
colhnearity of our data. For general estimating equations, colhnearity no t only depends on 
the de sign r natrix, but also varies acro ss the parameter space. For example, 



Mackinnon and Puterman 



(|l989|) and 



Lesaffre and Marxi (Il993l ) showed that generalized linear models can be coUinear 
even if their design matrices are not, and vice versa. In our situation, we are concerned with 
colhnearity along the line segment between /3q and 0. Note that because cr^i^^ depends only 
on i, /3q, and 0, which are all nonrandom quantities, cr^^,^ is nonrandom as well. 



2.4 Choosing 7^ 

Theorems [1] and [2] specify o ptimal rates f or 7„ . and a number of methods have been proposed 



for choosing 7„ in practice. 



Fan and Lvl (120081 ) suggested c hoosing 7„, such tha t \ = n — 1 



Zhao and Li 



Zhu et al. 



( 2012 ) showed that 



(120111 ) also recently 



or n/\ogn. Because these values are hard to interpret 
■jn is related to the expected false positive rate of screening, 
proposed a thresholding method based on adding artiflcial auxiliary variables, and provided a 
bound relating the number of added variables to the probability of including an unimportant 
covariate. These methods offer more interpretable ways of choosing how many covariates to 
retain with EEScreen. A related strategy is to set a desired false discovery rate. 



Bunea et al. 



(120061 ) showed th at FDR metho ds can achieve the sure screening property in the ordinary 



linear model, and ISarkarl (12004| ) proposed an FDR method than can also control the false 
negative rate. It would be interesting to pursue this type of idea for EEScreen. 

In practice, however, we are often concerned with the prediction error of the estimator 
obtained by fltting a regularized regression method after EEScreen. If we used the methods 
above we would still need to choose a false positive rate or false discovery rate, but so far 
it is not clear what choices would give optimal prediction. In this case another option is to 
retain different numbers of covariates, flt the regularized regression for each screened model 
jCi, and select the jCi that gives the lowest cross- validated estimate of prediction error. This 
is the approach we take in Section |6l where we use EEScreen to analyze data from a multiple 
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myeloma clinical trail. 



3 Model-free screening 



Zhuetal 



( 120111 ) recently proposed a screening statistic that can achieve sure screening 
for any single-index model. Specifically, for a completely observed response Yi and a p- 
dimensional covariate vector Xj, they assumed that F{y \ Xj) = Fq^i/ \ X.J(3q), where 
F{y I Xj) = P(l^; < y I Xj) and Fq is some distribution function that depends on Xj 
only through the index Xf/3Q, so that j G if and only if /3oj 7^ 0. This is a very mild 



assumption that holds for a large class of models, making the screening method of 



ZhuetaL 



( 1201 ll ) almost model-free. 

To simplify things, they assumed that E(Xj) = and var(Xj) = Ip,^, where Ip„ is the 
Pn X Pn identity matrix. They quantified the marginal relationship between the covariates 
and an outcome y by using the novel statistic 



niy) = E{X,F(|/ I X,)} = cov{X„F(|/ | X,)} = cov{X„/(y; < y)}. 



(6) 



Intuitively, the covariance between Xij and F{Yi \ Xj), where Xij is the j*'* component of Xj, 
should be large in magnitude if j E Ai. They therefore used uj = E{fij(Fj)^} as a measure 
of marginal association, where ^j{y) is the j^^ component of r2(y), leading to the screening 
statistic 

k=l I i=l 

This derivation of the screening procedure of 



Zhu et al. 



(7) 



(120111 ) makes no mention of 



estimation of /3o, making it seemingly irreconcilable with our EEScreen, which requires 
an estimating equation. However, we can actually show that EEScreen, combined with a 
particular estimating equation, leads to a very similar screening procedure. This further 
illustrates the fiexibility and wide applicability of our proposed screening strategy. 
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Note that conditional on Xj and X^, FoiYi \ X.J(3q) and Fq(Yi: \ X^/3q) are independent 
and identically distributed uniform random variables. Therefore, we know that 



P Fo(f, I Xf/3o) < Fo(F, I X^/So 



E 



P Fo(y, I X//3o) < FoiY, I X^/3o) | X„X, 



(9) 



This fact can be used to construct the marginal estimating equations. Consider 



k=l 1=1 



/{Fo(r, I Xf/3) < Fo(n I X^/3)} 



(10) 



Since E{U(/3o)} = Q, ( ITOl) is an unbiased estimating equation for fB^. Furthermore, it is a 
U-statistic of order m = 2, which is covered by the framework of Section 12.31 

It is important to note that ( fTOj) cannot be implemented in practice, because the func- 
tional form of FqIh \ ^^C- (3) is unknown, yet it is still useful for constructing a screening 
procedure. Recall that EEScreen uses the statistic U(0), and for ffTOj) . 



U(0)=n-^J]J]X, 



k=l i=l 

n n 



I{Fo{Y, I XfO) < Fo(n I X^O)} 



n 



5^5^xJ/(r.<n)-- , 

k=l i=l ^ ^ 



'IV. 



(12) 



because Fq^i/ \ XfO) = Fo(y | X|^0) = Fo(?/ | 0), which is a monotonic function since Fq is 
a distribution function. Under the marginal model that /3oj' is the only non-zero parameter, 
the j*'' component of E{U(0)} is cor{Xij,F{Yi \ Xij>)}. Thus the j'*^ component of U(0) 



gives the most powerful score test, so EEScreen with (|TOl) retains parameters 



J ■■ 



n 



k=i i=i ^ ^> 



> In 



(13) 
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or equivalently, 



J 



n 



> In 



(14) 



fc=l i=l 

because the Xj are standardized to have mean 0. In the notation of IZhu et al.l ( 201ll ). this 
is equivalent to using |i?{fij(yi)}| as the screening statistic for the j^^ covariate, rather than 

E{n,(F,)2}. 

The Yi may not be fully observed in the presence of censoring. If Ci are the censoring 
times, let = min(yj,Cj) and 6i = I{Yi < Ci). Then if we assume that the Ci are 
independent of the Yi and Xj, we can see that 



E 



5ri{Yi < n 



Xj, X^ 



E 



E 



E 



E 



I{Yi < Ci)I{Yi < Ck)I{Y, < Yk\ 



Yi, Xj, Xfc 



Sl{Yk)I{Y, < Yk 



SUYi) 
E{I{Yk<Yk) |X„Xfc}, 



Yi, Xj, Xfc 



(15) 

(16) 
(17) 



where Sc is the survival function of the Cj. If the support of the Cj is less than that of the 
Yi, the SciXi) term above could equal for some Fj. Thus this method of accommodating 
censoring could cause difficulty if it were used in th e estimating equation ( !T0|) and could lead 



to inconsistent estimation of fB^ (IFine et al. 



19981 ) ■ For simplicity, we will assume here that 



the support of Cj is greater than or equal to that of Yi. 

This then suggests that in the presence of censoring, the screening statistic of 



ZhuetaL 



(1201 1[ ) should become 



n 



S^I{Y, < Yk 



k=l K i=l ^cO^i) 



and the screening statistic derived using EEScreen should become 



n 



k=l i=l 



S^I{Y, < Yk 



(19) 
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where Sc is the Kaplan-Meier estimate of Sc- This illustrates that the EEScreen 
i s flex ible enough to allow us to derive something similar to the approach of 



"ramework 



Zhu et al. 



(120111 ). which was originally motivated by very different considerations. It also suggests that 
EEScreen can provide a sensible screening procedure for a particular model, such as the 
single-index model, even if the associated estimating equation (|T0|) is not implementable in 
practice. 



4 iEEScreen 



Though the simplicity of EEScreen and related screening procedures is appealing, if the 
covariates are highly correlated, then in flnite samples these univariate screening methods 



may not be able to ac hieve sure screenin^ 



To address this issue, 



Fan and Lvl (120081 ) and 



5 wit hout incurr i ng a 



Fan et al. 



arge number of false positives. 



(120091 ) proposed iterative screening. 



where the general idea is as follows. Below, Mi and Ai denote sets of covariate indices. In 
other words, A4i,Ai C {1, . . . 

1. Set Aio to be the empty set. 

2. For / = 1 : L, 

(a) controlling for the variables in Aii-i, screen the remaining covariates 

(b) select a set Ai of the most important of these covariates 

(c) use a multivariate variable selection method, such as lasso or SCAD, on the co- 
variates in ^Al_l U Al to get a reduced set M.i 

We can adapt these ideas to develop an iterative version of EEScreen, which we will 
call iEEScreen. However, to operationalize iEEScreen and iterative screening algorithms 
in general, we must flrst specify a number of parameters, such as how large \Ai\ and \M.i\ 
shou ld be, wha t multi variate variable selection procedure to use, and how many iterations to 



run. iFan et al 



(1201 ll ) recommended choosing the Ai using a permutation-based procedure. 
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and the A4i using a SCAD-type variable selector (iFan and Lil . 120011 ) with cross-validation. 
Their iterations stop when either \A4i\ > \Ai\, or J^i = Aii-i. These are sensible choices, 

t to ana l yze. 



WolfsonI mm . viewed as a 



but the many different layers of this procedure make it difficu^ 
Instead, here we will show that the EEBoost method of 
variable selector rather than an estimation procedure, can actually be thought of as a version 
of iEEScreen. By linking iterative screening and boosting, we embed iEEScreen in the 
theoretical framework already developed for EEBoost and other boosting methods. In the 
future, this theoretical framework could in turn be applied to analyze the properties of 
iterative screening. 



2OIII ). For some small e > 



We first briefly describe the EEBoost algorithm (jWolfson 
and the full estimating equation U, 

1. Set /3(°) = 0. 



2. For t = 1 : T, 

(a) compute A = \V{f3^'-^^)\ 

(b) identify jt = argmax^ A^, where Aj is the j^^ component of A 

(c) set Z?]*'' = /^j* — e ■ sign(AjJ, where /Sj*'' is the j^^ component of /S*-*^ 

Here, T serves as the regularization parameter, and for a given T only a certain number 



of I3f w i 



1 hav e been updated from their initial values of zero, effecting variable selection. 



WolfsonI ( 120111 ) recommends choosing e in the range [0.001,0.05], and T can be chosen with 
some tuning procedure. 

To express EEBoost as an iterative version of EEScreen, note that at the beginning of 
EEBoost, Aj corresponds to the screening statistic |f/j(0)| used in EEScreen. Evaluating U 
at subsequent /3^*~^'' is a way of controlling for the variables that have already been selected 
into the model by EEBoost, which is step 2(a) of iterative screening. In particular, for 
i = 0,1,... define ti such that ||/3''*'''||o 7^ ||o- In other words, to is the first time 
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that the number of nonzero components of /J*'*^ changes, ti is the second time this happens, 
and so on. Then looking back at the iterative screening algorithm, for / = 1, . . . , L we can 
identify Mi-i to be {j : pf''^^ ^ 0}, Ai to be {jtj, and Mi as being obtained by running 
EEBoost for ti iterations starting from the covariates in A^t,-i U Ai. We can choose L by 
tuning EEBoost with a generalized cross-validation-type criterion. We will thus implement 
iEEScreen using the EEBoost algorithm. 

In the remainder of this paper we study the effects of using EEScreen and iEEScreen as 
preprocessing steps before fitting regularized regression models. In particular, we will use 
EEBoost to fit the regressions, for two reasons. First, we would like to compare the effects 
of retaining different numbers of covariates after screening, from keeping only one or two 
covariates to keeping tens of thousands. Therefore we require a regularization method for 
estimating equations that can handle an arbitrarily large number of covariates. Second, in 
Section 15.21 we study a discrete estimating equation, so we require a regularization method 
which works well in that situation. To our knowledge, EEBoost is the only procedure that 
meets both of these criteria. 

However, this leads to a unique problem. We would naturally like to compare the effects 
of using EEScreen versus iEEScreen. But a careful inspection of the EEBoost algorithm 
reveals that running EEBoost twice, in other words first selecting covariates using EEBoost, 
and then using only those covariates in another instance of EEBoost, is actually identical to 
using EEBoost only once. This means that screening with the version of iEEScreen described 
in this section has no effect if EEBoost is then used for model-fitting. This behavior is 
different from, sa y, the lasso, where running two iterations of the lasso has been termed 



the relaxed lasso ( iMeinshausen 



20071 ) and can give different results from the regular lasso. 
Therefore while we will be able to compare the variable selection properties of EEScreen 
and iEEScreen in simulations, where we will know the true model, we will not be able to 
compare EEScreen-|-EEBoost versus iEEScreen -l-EEBoost. We would like to address this 
issue in future work. 
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5 Simulations 



In our simulation studies, we evaluated the performances of EEScreen and iEEScreen with 
two different estimating equations, one for a t-year survival model and the other for an 
accelerated failure time model. We implemented iEEScreen by using EEBoost, as described 
in Section m with e = 0.01. We compar ed these to the n aive approach of fitting p„ marginal 



regressions, as well as to the method of 
(fT9l) from Section [3l 



ZhuetaL 



( I2OIII ) and our EEScreen-derived method 



We studied p„ = 20000 covariates and set the true parameter vector (3q to be such that 
/3oj = 1.5, J = 1, . . . , 10, f3oj = -0.8, J = 11, ... , 20, and f3oj = 0,j = 21, . . . We 
generated covariates Xj from a p„-dimensional zero- mean multivariate normal. To simulate 
a n easy setting we used a covariance matrix that satisfied the partial orthogonality condition 



of lFan and Song) (j2010l ). where the important covariates were independent of the unimportant 
covariates. The covariance matrix consisted of 9 blocks of 10 covariates, 1 block of 910 
covariates, and 19 blocks of 1000 covariates. Each block had a compound symmetry structure 
with the same correlation parameter p, which was equal to either 0.5 or 0.9, and the blocks 
were independent from each other. We matched the non-zero components of (3q with two of 
the 10-dimensional blocks. To simulate a more difficult setting we let the entire covariance 
matrix have a compound symmetry structure with p equal to either 0.3 or 0.5. 



5.1 The t-year survival model 

We first considered a t-year survival model. Let T, and Xj be the survival time and the 
covariate vector of the i^^ patient, respectively. We modeled the probability of surviving 
beyond some time to conditional on covariates as 

logit{P(T, > to I X,)} = Xf/3o. 
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This model is very useful in clinical investigations, and in fact we apply it to data from 
clinical trials of multiple myeloma therapies in Section |6l 

However, we cannot use the logistic regression because the Tj are not directly observed. 
Let Ci be the censoring time, such that we only observe Yi = min(Tj, Cj) and 6i = I{Ti < Ci). 



Without modeling the Cj, it is difficult to specify a full likelihood mod el for t 
instead turn to estimating equations. To account for the censored data 



lis data, so we 



Jung! (119961 ) assumed 



that the Ci were independent of the Tj and the and proposed using the estimating equation 



U(/3) = n-' 



X,7r'(Xf/3) 



I{Y^ > to) 



- vr(Xr/3) 



-7r(Xf/3){l-7r(Xf^)} [ 5c(tc 



(20) 



where i^ij]) = logit~^(r7), rr'(ri) = dn/di], and Sc{t) is the Kaplan-Meier estimate of the 
survival function of the Ci. According to our procedure, after some simplification we see 
that EEScreen will retain the parameters 



1=1 



liY, > to) 



Sc{to 



> In 



(21) 



Though Uj does not satisfy Assumption [T] because of the Scif) term, Ijungi ( 1996 ) showed 
that it can be written in the appropriate form, plus a negligible op(l) term. To fit the p„ 
regressions for the marginal screening method we used a simple Newton-Raphson procedure 
to solve Vj. 

Tuning EEBoost and iEEScreen was difficult because commonly used criteria such as 
AIC or BIG are not defined in the absence of a likelihood. We instead chose to minimize the 
GCV-type criterion BS/{1 — n^-'^ ||^||o)^, where ||;9||o is the number of nonzero components 
of ^, and BS is the estimate of the Brier score at t p. If 7r(tn | X,;) is the predicted survival 



probability of patient i at to, then BS is defined by 



Graf et al 



(Il999h as 



BS = n-'Yl 



{0-7r(to|X,)r U^il^lM2^T(V>i} 
—I{Yi <to,di = l)-\ 7—— -'(^i > ^oj 



SciX, 



Scito) 



(22) 
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Table 1: Median minimum model size (interquartile range) for the t-year survival model 



Partial orthogonality Compound symmetry 

p = 0.5 p = 0.9 p = 0.3 p = 0.5 



EEScreen 
Marginal 
Zhu et al. (2011) 
Method (2) 



2849 (6180) 
2908 (6278) 
9614.5 (9497.75) 
7559.5 (11737.75) 



22 (249.5) 
22 (228.75) 
2043.5 (7687) 
944.5 (4121.25) 



19666.5 (610.5) 
19659 (611.5) 
19647.5 (655.5) 
19614.5 (716.75) 



19676 (559.5) 
19696 (550.5) 
19531.5 (737) 
19545.5 (726.5) 



Table 2: Average runtime in seconds (standard deviation) for the t-year survival model 

Partial orthogonality Compound symmetry 

p = 0.5 p = 0.9 p = 0.3 p = 0.5 

EEScreen 1.29 (0.09) 1.38 (0.47) 1.38 (0.36) 1.32 (0.16) 

Marginal 617.79 (61.99) 1023.79 (1405.58) 1608.09 (2594.27) 1054.86 (252.32) 
Zhu et al. (2011) 1.52 (0.08) 1.58 (0.45) 1.88 (4.47) 1.49 (0.2) 

Method (2) 1.54 (0.09) 1.58 (0.45) 2.13 (8.02) 1.48 (0.18) 



We generated survival times for n = 100 subjects from log(Tj) = Xf/Sg + Si with Si 
having a lo gistic distribution with mean -0.5 and scale 1. Under this scheme the model of 



Jund ( 1l996l ) is correctly specified. We generated Cj from an exponential distribution to give 
approximately 50% censoring. We observed that the 20*^* percentile of the simulated survival 
times was roughly to = 0.005, so we used this to when implementing the estimating equation. 
We simulated 200 such datasets. 

Table [Hreports the median sizes of the smallest models A4 found by the different screening 
methods that still contained the true model Ai. The performance is best under the partial 
orthogonality setting when p = 0.9, which is not surprising because this setting leads to 
the greatest separation between the important and unimportant covariates. EEScreen and 



marginal screening show s imilar per: 



outperform the method of 



Zhu et al 



'ormances, while our method ( |T9l) appears to actually 



(1201 if ) in the partial orthogonality setting. 
Though EEScreen and marginal screening produce similar results. Table [2] shows that 
marginal screening, at least for this t-year survival model, can take much longer. These 
simulations were run on the Orchestra cluster supported by the Harvard Medical School 
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Research Information Technology Group, on machines with 3.6 GHz Intel Xeon processors 
with at least 12GB of memory, and marginal screening t ook at least 10 m inutes. On the 
other hand, the EEScreen-type methods and the method of l 



Zhuetal 



( 1201 ll ) were completed 



in a few seconds, showing the EEScreen can be much more computationally efficient than 
standard screening methods. 

To better understand the performances of these various screening methods, we studied 
in Figure [1] the average number of false positives corresponding to a given number of false 
negatives achieved by the screened model jCi. We again see that the methods perform best in 
the partial orthogonality setting when the correlation is high. Furthermore, given the same 
setting, EEScreen performs better than the model-free methods. This is most likely because 



the model used by EEScreen is correctly specified, and thus should be m ore power: 



the model-free methods. This type of phenomenon was also pointed o ut by 



As in Table H] our method (fT9i) again appears to outperform that of 



Zhu et al. 



Zhu et al. 



ul than 



( 1201 ll ) 



(201l|). 
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Figure 2: Mean squared errors for the t-year survival model 

Figure [T] also shows that in all cases, the variable selection performance of iEEScreen far 
outperforms the other methods, particularly in the compound symmetry setting. However, 
we found that iEEScreen is not able to include all of the important covariates. In the partial 
orthogonality setting, it can only include up to 17 or 18 of the important covariates, while in 
the compound symmetry setting it cannot achieve fewer than 15 false negatives. It turns out 
that the boosting procedure we use to implement iEEScreen saturates at some point in its 
fitting, perhaps due to the fact that there are more parameters than covariates, or perhaps 
because our choice for the boosting parameter e = 0.01 might be too large. 

Next we studied the effect on estimation accuracy of using screening before fitting a 
regularized regression model with EEBoost. Figure [2] reports the average mean squared 
error of estimation (MSE) as a function of |A^|, the number of variables kept after screening. 
Here we defined MSE as — /JoHi; where ^ is the estimate obtained by EEBoost after 
screening. It is clear that using EEScreen first can improve the estimation accuracy of 
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Figure 3: Out-of-sample AUCs for the t-year survival model 

EEBoost, especially in the compound symmetry setting. Screening with the model- free 
methods does not appear to reduce the MSE, perhaps because they need to retain a large 
number of covariates before they include the important variables (Tabled]). 

On the other hand, estimation error is not so meaningful in the absence of a correctly 
specified model. We therefore considered the out-of-sample predictive ability, as measured by 



the AUG statistic ( lUno et al 



20071 ) at time to, of the models fit by EEBoost after screening 
in Figure |3l In the partial orthogonality settings, using EEScreen first does not appear to 
have much of an effect on the AUG, while in the compound symmetry setting it does improve 
the predictive ability of the subsequent fitted model. Our model-free method flTI?]) does not 
seem to have much of an effect on AUG in either setting, but appears to perform slightly 



better than the method of 



Zhu et al 



201 ih 
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5.2 The accelerated failure time model 

The t-year survival model is useful when we are interested in a fixed event time. To study 
the entire survival distribution, one useful approach is the accelerated failure time (AFT) 
model, which posits that 

log(T,) = Xf/3o + £„ (23) 

where the Si are independent and identically distributed, and the Si can have an arbitrary 
distribution. The f3 can be estimated using the U-statistic-based estimating equation 



U(/3) = n-^ J2^^>^ - X^)/{ei(/3) < ek{f3)}5i 



(24) 



i=l k=l 



where ei(/3) = logiY^) -X.J j3 jTsiatisl . 



1996 



Jin et al. 



2003 



Cai et al. 



20091 ) . Following our 



procedure, after some simplification we see that EEScreen will retain the parameters 



J 



i=l k=l 



> In 



(25) 



This is a U-statistic of order m = 2 and therefore satisfies our assumptions in Section 12.31 
Despite being a discrete estimating equation, fl24|) poses no additional problems to EEScreen 
or iEEScr een. To fit the pn regressions for the marginal screening method we used the 



method of 



Jin et al. 



(l2003l ). available in the R package Iss. 



To tune EEBoost and iEEScreen, consider the function 



L{(3) = n-' J2 - e.(/3)}/{e.(/3) < e,(/3)}<5.. 

i=i j=i 



Cai et al. 



(120091 ) ■ in their work on regularized estimation for the AF T model, 



L[f3) is an adequate measure of the accuracy of estimation. They and 



(26) 



a rgued that 



Jin et al 



fl2003h also 



noted that U(/3) is the "quasiderivative" of —L{f3). For these reasons, we tuned EEBoost 
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Table 3: Median minimum model size (interquartile range) for the AFT model 





Partial orthogonality 


Compound symmetry 




p = 0.5 


p = 0.9 


p = 0.3 


p = 0.5 


EEScreen 


997 (2968.75) 


20 (2) 


19829.5 (316.25) 


19822.5 (401.25) 


Marginal 


1750.5 (3742.25) 


21 (144) 


19835 (353) 


19764 (436) 


Zhu et al. (2011) 


10761.5 (9416) 


747 (3804.5) 


19482 (854) 


19464.5 (922.5) 


Method dH]) 


7940.5 (11962.75) 


282.5 (2230.5) 


19501.5 (800.25) 


19522 (785.75) 



Table 4: Average runtime in seconds (standard deviation) for the AFT model 



Partial orthogonality Compound symmetry 

p = 0.5 /9 = 0.9 p = 0.3 p = 0.5 



EEScreen 


1.58 (0.15) 


1.53 (0.1) 


1.51 (0.1) 


1.51 (0.11) 


Marginal 


1024.71 (114.2) 


971.85 (82.5) 


1081.56 (149.64) 


1203.19 (106.81) 


Zhu et al. (2011) 


1.6 (0.16) 


1.46 (0.11) 


1.44 (0.1) 


1.46 (0.11) 


Method ^ 


1.6 (0.15) 


1.46 (0.11) 


1.44 (0.09) 


1.45 (0.11) 



by minimizing the GCV-type criterion 

mi{l-n-Xm\ (27) 



where we used L{l3) in place of a negative log-likelihood. 

We generated n = 100 survival times from log(Tj) = 'X.ffBQ + Si with Si having a standard 
normal distribution. We generated Ci independently from an exponential distribution to 
give approximately 50% censoring, and we simulated 200 datasets. 

We report for the different screening methods the smallest Xi that still contained Ai 
in Table [21 As with the t-year survival model, the methods perform best in the partial 
orthogona hty setting with p = 0.9. We also again see that our method flT9|) outperforms the 



method of IZhu et al. 



( 120111 ). In addition. Table Hlshows that marginal sc reening is rnuch rn ore 



Zhu et al. 



f!201lh 



time-consuming than the EEScreen-based methods or the procedure of 

Figure H] reports the average number of false positives contained in A4 as the number 
of allowed false negatives is varied. As in the t-year survival model simulations, iEEScreen 
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Figure 4: Screening performances for the AFT model 



performs better than non-iterative EEScreen, though in the compound symmetry case it also 
saturates before it can select all of the important covariates. We also see that the EEScreen 
outperforms t he model-free met hods again, and that our method (fT9|) somewhat outperforms 
the method of IZhu et al.l (120111 ). The plots in Figure H] for the model- free methods look very 
similar to the corresponding ones in Figured], and this is because the models used to generate 
both survival times were both AFT models, differing only in the distributions of the error 
terms. 

The average mean square errors of the models fit after screening are plotted in Figure |5l 
Similar to the results for the t-year survival model, we see that screening using model-free 
methods does not improve the estimation accuracy of the subsequent regularized regression 
fit. Interestingly, for the AFT model it appears that screening with EEScreen only barely 
decreases the MSE under partial orthogonality, and is actually detrimental to the MSE in 
the compound symmetry setting, in contrast to the results for the t-year survival model. 
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Figure 5: Mean squared errors for the AFT model 
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Figure 6: Out-of-sample C-statistics for the AFT model 



27 



We see something similar when we examine the out-of-sample predi ctive abiht i es of the 



201l|) of 



models fit by EEBoost after screening. We calculated the C-statistics ( lUno et al.l . 
the fitted models on independently generated datasets and report them in FigureO EEScreen 
does not have much of an effect on the C-statistic, while using the model-free methods tend 
to decrease the predictive ability of the fitted model. 

The results in Figures E] and E] are in contrast to the corresponding t-year survival sim- 
ulation results, which showed the EEScreen can indeed improve MSE and prediction. This 
may be due to the way these figures were generated: to plot these figures we varied the size 
of Ai from between 400 to 20000 in increments of 400. However, the advantages of screening 
in the AFT setting perhaps may only be seen if fewer than 400 covariates are retained. 



6 Data example 



We illustrate our methods on data from a multiple myeloma clinical trial. Multiple myeloma 
is the second-most common hematological cancer, but despite recent advances in therapy 
the sickest patients have seen little improvement in their prognoses. It is of great interest 
to explore whether genomic data can be used to predict which patients will fall into this 
high-risk subgroup, so that they might be targeted for more aggressive or experimental 
therapies. 

The MicroArray Quality Control Consortium II (MAQC-II) study posed exactly this 
questi on to 36 teams o f analysts representing academic, government, and industrial institu- 



tions (IShi et al. 



20101 ). It used data from newly diagnosed multiple myeloma patients who 



were recruited into clinical trials UARK 98-026 and UARK 2003-33, which studied the treat- 



ment regimes total therapy II ( TT2) and total therapy III (TT3), respectively (jZhan et al. 



20061: 



Shaughnessy et al. 



20071 ). Teams were asked to predict the probability of surviving 



past tn = 24 months, which is ro ughly the median survival time of high-risk myeloma pa- 



tients ( iKyle and Rajkumar 



20081 ). using the TT2 arm as the training set and the TT3 arm 
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Overall survival in TT2 arm 



Overall survival in TT3 arm 




Months 




Figure 7: Kaplan-Meier estimates from multiple myeloma clinical trials 
as the testing set. 

There were 340 patients in TT2, with 126 events and an average follow-up time of 55.82 
months, and 214 patients in TT3, with 43 events and an average follow-up of 37.03 months. 
The Kaplan-Meier estimates of the survival curves are given in Figure [71 Gene expression 
values for 54675 probesets were measured for each subject using Affymetrix U133Plus2.0 
microarrays, and 13 clinical variables were also recorded, including age, gender, race, and 
serum /32-microglobulin and albumin levels. 

Figure [7] shows that there was a patient in TT2 censored before 24 months, so we cannot 
model these data using simple logistic regression. We therefore considered the t-year survival 
model with estimating equation (l20l) . from Section [5.11 Because we had a total of 54688 
covariates and only 340 patients in TT2, we first implemented a scr eening step, whe re we 



ZhuetaL 



(120111). We 



considered EEScreen, our model-free method (lT9l) . and the method ofl 
then fit the screened variables using EEBoost, with the generalized cross-validation criterion 
described in Section ISTTl To choose the size of Ai,we used 5-fold cross-validation and selected 
the value of that gave the best average AUG statistic. The values we considered were 
10, 50, 100, 500, 1000, and the numbers from 5000 to 54688 in increments of 5000. Finally, 
we validated our model in the TT3 arm. 

Table l5l summarizes our results. We first focused on the AUGs estimated using five-fold 
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Table 5: AUCs for probability of surviving past to = 24 months 



Method 


Optimal \M\ 


5-fold CV AUG (SD) 


AUG in TT3 


EEScreen (t-year) 


5000 


0.61 (0.03) 


0.61 


Method (USD 


10 


0.63 (0.06) 


0.58 


Zhu et al. (2011) 


100 


0.67 (0.08) 


0.59 


EEScreen (AFT) 


100 


0.65 (0.08) 


0.70 



cross-validation. Surprisingly, we found that EEScreen gave us the lowest AUG, and that the 
model-free methods required fewer covariates while giving better prediction. However, note 
that screening using the t-year survival estimating equation fl20l) essentially dichotomizes 
the observed times to binary outcomes, because we are only modeling whether they are 
lar ger than t o . In c ontrast, we can see from the forms of method ( JT9l) and the procedure 



of 



Zhu et al. 



( 120111 ) that they use continuous outcomes. We therefore hypothesized that 
the model-free methods had more power than EEScreen based on equation ( l20l) to detect 
covariate effects, even though they did not incorporate any modeling assumptions. 

To test this hypothesis we examined the performance of using EEScreen based on the 
AFT model estimating equation (I24p . This strategy does not dichotomize the survival out- 
comes and is also a more restrictive model than the t-year model because it makes a global 
assumption on the distribution of the survival times. After screening we still used the t-year 
survival model to fit the retained covariates. Indeed, Table [5] shows that with this strategy, 
we needed to retain only 100 covariates to achieve a high AUG. 

Turning now to the validation AUGs calculated in the TT3 arm, we found that though 
the model-free methods gave higher AUGs in cross-validation, their validation AUGs were 
essentially comparable to that of EEScreen based on the t-year survival model. This might 
perhaps indicate that the model-free methods actually overfit to patients in the TT2 arm, 
and thus their results didn't generalize well to patients treated with TT3. In contrast, the 
EEScreen method based on the AFT model gave a much higher validation AUG of 70%. 
The final fitted model contained 37 covariates, which in addition to various gene expression 
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levels also included /32-niicroglobulin, albumin, and lactate dehydrogenase levels. Thus our 
method was able to select important clinical predictors in addition to identifying potentially 
important genomic factors. 



7 Discussion 



In this paper we introduced EEScreen, a new computationally convenient screening method 
that can be used with any estimating equation-based regression method. We proved finite- 
sample performance guarantees that hold for any model that can be fit with U-statistic-based 
estimating equations, and in addition showed that our approac h could be used t o derive a 



Zhuetal 



torn . Finally, 



model-free screening procedure very similar to one proposed by 
we have drawn a con nection betwee n screening and boosting methods, showing that the 
EEBoost algorithm of 



WolfsonI ( 1201 ll ) can be viewed as a form of iterative screening. 
Our simulation results, conducted using a t-year survival model as well as the AFT 
model, support the use of EEScreen in practice. They suggest that EEScreen is capable of 
retaining most of the important covariates without also including too many false positives, 
unless the covariates are very highly correlated. In terms of estimation and prediction, when 
the working model is correctly specified, using EEScreen will usually not give worse results 
than not using screening at all, and at the very least will dramatically reduce the required 
computation time. This does not always appear to be true of the model-free methods. 

On the other hand, in our multiple myeloma example we saw that using different models 
for the screening step and the regression step can offer better performance than keeping 
to one model throughout. This illustrates the difficulty in choosing a default screening 
procedure that works well in all cases. However, our myeloma results suggest that one key 
consideration is the power of the screening step. The AFT model-based screening appeared to 
have greater power than the t-year model, and perhaps its modeling assumptions prevented 
it from overfitting to the TT2 arm, as the model-free methods seemed to do. 
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This insight imphes that different situations will require choosing different screening 
methods in order to achieve the greatest power. Estimating equations give us access to a 
wide range of models to choose from, with more parametric models offering lower variance 
but higher bias, and models with fewer assumptions offering the opposite tradeoff. Thus our 
EEScreen approach is perfectly suited to this screening strategy, offering quick computation 
and good theoretical properties for whichever model we decide to use. 

Acknowledgments 

We thank Professors Lee Dicker and Julian Wolfson for reading an earlier version of this 
manuscript. We also thank Professors Tianxi Cai, Tony Cai, Jianqing Fan, Hongzhe Li, and 
Xihong Lin for their many helpful comments and suggestions. Sihai Zhao is grateful for the 
support provided by NIH-NIGMS training grant T32-GM074897. 

References 

Bunea, P., Wegkamp, M., and Auguste, A. (2006). Consistent variable selection in high 
dimensional regression via multiple testing. Journal of Statistical Planning and Inference 
136, 4349-4364. 

Cai, T., Huang, J., and Tian, L. (2009). Regularized estimation for the accelerated failure 
time model. Biometrics 65, 394-404. 

Candes, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much 
larger than n. The Annals of Statistics 35, 2313-2351. 

Fan, J., Feng, Y., and Song, R. (2011). Nonparametric independence screening in sparse 
ultra-high-dimensional additive models. Journal of the American Statistical Association 
106, 544-557. 



32 



Fan, J. and Li, R. (2001). Variable selection via noncave penalized likelihood and its oracle 
properties. Journal of the American Statistical Association 96, 1348-1360. 

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature 
space. Journal of the Royal Statistical Society, Ser. B 70, 849-911. 

Fan, J., Samworth, R., and Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond 
the linear model. The Journal of Machine Learning Research 10, 2013-2038. 

Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models and 
NP-dimensionality. The Annals of Statistics 38, 3567-3604. 

Fine, J. R, Ying, Z., and Wei, L. J. (1998). On the linear transformation model for censored 
data. Biometrika 85, 980-986. 

Fu, W. (2003). Penalized estimating equations. Biometrics 59, 126-132. 

Gorst-Rasmussen, A. and Scheike, T. H. (2011). Independent screening for single-index 
hazard rate moels with ultra-high dimensional features. Technical Report R-2011-06, 
Department of Mathematical Sciences, Aalborg University. 

Graf, E. A., Schmoor, C., Sauerbrei, W., and Schumacher, M. (1999). Assessment and 
comparison of prognostic classification schemes for survival data. Statistics in Medicine 
18, 2529-2545. 

Hall, W. S. and Newell, M. L. (1979). The mean value theorem for vector valued functions: 
a simple proof. Mathematics Magazine 52, 157-158. 

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal 
of the American Statistical Association 58, 13-30. 

Jin, Z., Lin, D. Y., Wei, L. J., and Ying, Z. (2003). Rank-based inference for the accelerated 
failure time model. Biometrika 90, 341-353. 



33 



Johnson, B. A., Lin, D. Y., and Zeng, D. (2008). Penalized estimating functions and vari- 
able selection in semiparametric regression models. Journal of the American Statistical 
Association 103, 672-680. 

Jung, S.-H. (1996). Regression analysis for long-term survival rate. Biometrika 83, 227-232. 

Kyle, R. and Rajkumar, S. (2008). ASH 50th anniversary review: Multiple myeloma. Blood 
111, 2962-2972. 

Lesaffre, E. and Marx, B. D. (1993). Collinearity in generalized linear regression. Commu- 
nications in Statistics - Theory and Methods 22, 1933-1952. 

Li, G., Peng, H., Zhang, J., and Zhu, L. (2011). Robust sure independence screening for 
ultrahigh dimensional models. Technical Report arXiv:1012.4255v2. 

Mackinnon, M. J. and Puterman, M. L. (1989). Collinearity in generalized linear models. 

Communications in Statistics - Theory and Methods 18, 3463-3472. 

Meinshausen, N. (2007). Relaxed lasso. Computational Statistics & Data Analysis 52, 374- 
393. 

Sarkar, S. K. (2004). FDR-controlling stepwise procedures and their false negatives rates. 
Journal of Statistical Planning and Inference 125, 119-137. 

Shaughnessy, J., Zhan, F., Burington, B. E., Huang, Y., CoUa, S., Hanamura, I., Stewart, 
J. P., Kordsmeier, B., Randolph, C, WiUiams, D. R., et al. (2007). A vahdated gene 
expression model of high-risk multiple myeloma is defined by deregulated expression of 
genes mapping to chromosome 1. Blood 109, 2276-2284. 

Shi, L., Campbell, C, Jones, W. D., et al. (2010). The MAQC-II Project: A comprehen- 
sive study of common practices for the development and validation of microarray-based 
predictive models. Nature Biotechnology 28, 827-838. 



34 



Tibshirani, R. J. (1996). Regression shrinkage and selection via the lasso. Journal of the 
Royal Statistical Society, Ser. B 58, 267-288. 

Tsiatis, A. A. (1996). Estimating regression parameters using linear rank tests for censored 
data. The Annals of Statistics 18, 354-372. 

Uno, H., Cai, T., Pencina, M. J., D'Agostino, R. B., and Wei, L. J. (2011). On the C-statistics 
for evaluating overall adequacy of risk prediction procedures with censored survival data. 
Statistics in Medicine 30, 1105-1117. 

Uno, H., Cai, T., Tian, L., and Wei, L. J. (2007). Evaluating prediction rules for i-year 
survivors with censored regression models. Journal of the American Statistical Association 
102, 527-537. 

van de Geer, S. (1995). Exponential inequalities for martingales, with application to maxi- 
mum likelihood estimation for counting processes. The Annals of Statistics 23, 1779-1801. 

Wolfson, J. (2011). EEBoost: a general method for prediction and variable selection based 
on estimating equations. Journal of the American Statistical Association 106, 296-305. 

Zhan, P., Huang, Y., CoUa, S., Stewart, J., Hanamura, I., Gupta, S., Epstein, J., Yaccoby, S., 
Sawyer, J., Burington, B., et al. (2006). The molecular classification of multiple myeloma. 
Blood 108, 2020. 

Zhao, S. D. and Li, Y. (2012). Principled sure independence screening for Cox models with 
ultra-high-dimensional covariates. Journal of Multivariate Analysis 105, 397-411. 

Zhu, L.-R, Li, L., Li, R., and Zhu, L.-X. (2011). Model-free feature screening for ultrahigh- 
dimensional data. Journal of the American Statistical Association 106, 1464-1475. 



35 



A Proof of Theorem [T] 



The event {Ai C A^} equals {mmj(zM \Uj{0)\ > 7n}, so it is easy to see that 



P(A1 C^)>l-^P(|f/,.(0)|<7n) 



{21 



By the triangle inequality, we know that for all j, \uj{0)\ < \Uj{0) — Uj{0) \ + \Uj{0)\, and by 
Assumption [3] we see that ci[n/m]^'^ — \Uj{0)\ < \Uj{0) — Uj{0)\ for all j E M.. Therefore, 
|f/j(0)| < 7„ for i E M. implies |f/j(0) — ^(O)! > c^\n/mY^'^ /2. We can conc lude from 



Assumptions d] and [2] and Bernstein's inequality for U-statistics (IHoeffding 



19631 ) that 



V{M CM)> 1 -2s„exp 



c\[n/mY~'^'^ /A 
'2S2 + 6ci[n/m]-«/3 



(29) 



B Proof of Theorem [2 



For the marginal estimating equations Uj and their expected values Wj, we kn ow from As 



sumptions d] and [2] and Bernstein's inequality for U-statistics (IHoeffding 



19631 ) that 



[71/777.1^ ^'^/16 

P{max |f/,(0) - 77,(0)1 < Ci[77/7n]-V4} > 1 - 2p„exp \ - , , • 

3 I 2h'^ + bci[n I m\ '^/6) 



(30) 



Also, if maxj|f/j(0) - 77,(0) | < ci[77 /77i] "74, then \Uj{0)\ > 7^ implies that |77,(0)| > 
Ci [77/7/7] ~'^/4. This means that 



l-MI = |{j : \UM\ > 7n}| < |{j : \uM\ > Ci[77/7n]-74}| < 



16 



c1 [n/7r7]^ 



-.Y.^'M'- (31) 



From our EEScreen procedure described in Section [2?T| we see that the 77, (0) are the possibly 
relabeled components of the expected full estimating equation u(0). Thus X]j%(Q)^ — 
||u(0)||2, and by the generalization of the mean value theorem to vector-valued functions 
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(jHall and Newelll . Il979l ) and Assumptions |5] and HJ 



|u(0)||2 = ||u(/3o) - u(0)||2 < sup ||i(t/3o)||2||/3ol|2 < C2 sup a^ax{i(t/3o)} = C2<,,, (32) 

0<t<l 0<t<l 



so that 



\M\< 



CI [n I m] ^'^ 



> P{max||f/,-(0) -M,(0)|U < Ci[n/m]-74} 



> 1 - 2p„ exp 



'2S2 + 6ci[n/m]-'^/6 J ' 



(33) 
(34) 
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